Raison d’être


This book is prepared as part of the R4GC Community skill-enhancement and knowledge-gathering exercise.

It aims to consolidate the knowledgebase that the R4GC Community is gathering across its various portals and discussions.

It also serves to illustrate - as always, with source code - one of the most powerful features of R: the collaborative, peer-reviewed development of data science code and reports using R Markdown.

Contributors

R4GC is a collaborative effort of many people who have contributed to the development of the knowledgebase that is gathered in this book. They are listed below.

Jonathan Dench, Joseph Stinziano, Henry Luan, Eric Littlewood, Philippe-Israel Morin, Tony Machado, Maxime Girouard, Martin Jean, Tim Roy, Mehrez Samaali, Dejan Pavlic, Utku Suleymanoglu, Alex Goncharov.

Additionally, much support has been received from the wider international R community through the stackoverflow.com portal and the knowledge-sharing events organized by RStudio, as well as from several other Government of Canada employees who remain anonymous.

Their help is greatly appreciated.

Key principles

Open data, open source, open science

This book contains only information and knowledge obtained from the public domain.

It is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License, and is (and will always be) free to use.

Chatham House Rule

This book is prepared using the Chatham House Rule. The Chatham House Rule helps create a trusted environment to understand and resolve complex problems through dialogue and timely open communication. Its guiding spirit is: share the information you receive, but do not reveal the identity of who said it. Hence, no attributions are made and the identity of speakers and participants is not disclosed. The book is based on the views and code contributed by community members as part of ongoing community events and interactions. Offered as a means to facilitate discussion, the document does not constitute an analytical document, nor does it represent any formal position of any organisation involved.

History

The R4GC Community (formerly called the “Use R!” GCcollab community) was created in March 2021 to bring together R users across the Government of Canada. Here we gather and curate the knowledgebase related to the use of R within the Government of Canada. Everyone is welcome to join, whether you are an advanced R user, are just starting to learn it, or simply want to learn more about data science and how it is done.

The idea to create this group came after the GC Data2021 Conference Data Literacy Fest workshop on Data Engineering Challenges and Solutions: Demonstration of Shiny. The highest-voted question during the discussion there was: “How can I get more help for our members to enhance their knowledge and ‘spread the word’ and raise more awareness with regard to this tool?” The creation of this community group is the answer to this question.

By November 2021, the R4GC GCcollab group had become one of the largest active groups of data science practitioners in Canada, with over 250 members. The weekly “Lunch and Learn Data Science with R” meetups organized by the R4GC Community have been attended by data practitioners from over twenty government departments, and have generated hundreds of questions and answers, a dozen tutorials, multiple open-to-use applications, and thousands of lines of open code. On October 29, 2021, the work of this group was presented at the 2021 International Methodology Symposium.

This book aims to consolidate the knowledgebase gathered by the R4GC Community at its portals and meetups.

Lunch and Learn meetups

"Building advanced Data Science skills using R, together - one meeting at a time!"

These informal meetings are held weekly at lunch time on Fridays (from 12:05 to 12:55). There, data scientists who want to improve their knowledge of R and other data science subjects get together to show and discuss their R code and to share their coding tricks and methodologies. Normally, each session focuses on a particular subject or project, with the code shared on GCcode.

No registration is required to join the meeting. However, to view the notes and video recordings from these meetups, you need to join this sub-group: https://gccollab.ca/groups/about/7855030. For the agenda and MS Teams dial-in numbers, please see the Group Events page at https://gccollab.ca/event_calendar/group/7391537.

Community portals

The R4GC community makes use of the following collaborative platforms provided by Shared Services Canada.

GCcode group - r4gc

URL: https://gccode.ssc-spc.gc.ca/r4gc

GCcode is the GitLab solution that is accessible from within the GC network. As such, it allows one to view and update (pull and push) code and documentation with a single click of a button from RStudio on a GC laptop. A tutorial on how to do this has been developed. The ‘r4gc’ group has been created within GCcode, where the code, tutorials and other resources are gathered. It contains three main folders:

  • /codes - This is where “raw” (not reviewed, unedited) R code contributed by the GC community is uploaded. Currently, this includes code for analyzing and visualizing PSES (Public Service Employee Survey) results, ATIP requests and COVID-19 statistics, as well as various code for easing day-to-day work and maintenance. Some code is ready to become packages; some are short snippets taken from various blogs, question-and-answer portals such as www.stackoverflow.com and www.rseek.org, and open-source textbooks.

  • /gc-packages - This is where packages are developed from the submitted “raw” code. Currently it includes repositories for building packages to process PSES results and COVID-19 data, and a utility functions package for data engineering and efficient data processing.

  • /resources - This is where the rest of the knowledgebase is gathered, including the tutorials, slides, and code presented at the community's weekly ‘Lunch and Learn’ meetups.

GCcollab: R4GC (Use R)

URL: https://gccollab.ca/groups/profile/7391537/enuse-rfruse-r

GCcollab allows one to participate in the discussion from within and outside the GC network (for registered users). This makes it convenient for gathering information from any source, including those that may not be available from within the GC network. To facilitate the curation of knowledge, a number of discussion threads have been created there to address the topics of highest interest to the R4GC community. These are reviewed and updated regularly, commonly as part of the community's weekly meetups.

GCcollab Subgroup: Lunch and Learn R

URL: https://gccollab.ca/groups/profile/7855030/enfridays-lunch-and-learn-r-meet-upsfr

This subgroup was created for sharing minutes, notes and video recordings from the community's weekly meetups.

GCwiki: UseR!

URL: https://wiki.gccollab.ca/UseR!

This platform is used to consolidate all discussion topics in one place and link them with other data science resources in the wiki space.

GitHub: open-canada

URL: https://open-canada.github.io/UseR/

In line with the GC policies of open science and open data, and since most information gathered by the R4GC community is unclassified and comes from the public domain, a public-facing organizational account has been created on GitHub (https://github.com/open-canada) for sharing and growing the R4GC community knowledgebase. This is where public-facing community outputs are gathered, including the growing collection of web apps and code built with contributions from GC data scientists using open source tools and data.

GitHub (Apps repository)

https://github.com/open-canada/Apps

Slido (r4gc)

URL: https://sli.do. Code “r4gc”

This platform is used for wider-audience forums and conferences. A number of polls have been conducted there, in addition to opening and ranking new questions.

Book structure

The book is organized in several parts.

Part I is dedicated to General discussions, which includes the following:

  1. Why R?
  2. R and Python
  3. Best way to learn R
  4. Other resources and bookdown textbooks
  5. Events and Forums for R users

Part II is dedicated to the Best Practices and Efficient Coding in R and includes the following:

  6. From Excel to R
  7. data.table: your best fRiend
  8. Reading various kinds of data in R
  9. Efficient programming in R

Part III is dedicated to Visualization and Reporting and includes the following:

  10. R Markdown for literate programming and automated reports
  11. ggplot2 and its extensions for data visualization
  12. Shiny for Interactive Data Visualization, Analysis and Web App development
  13. Using R with GC data infrastructure (gcdocs, AWS, etc)
  14. Interactive Outputs in R: ‘plotly’, ‘Datatable’, ‘reactable’

Part IV is dedicated to the advanced use of Data Science, Machine Learning, and AI. It includes:

  15. Record Linking and other Data Engineering tasks in R
  16. Geo/Spatial coding and visualization in R
  17. Text Analysis in R
  18. Machine Learning and Modeling in R

Part V contains the Tutorials developed by and for the community. These are:

  19. GCCode 101
  20. Packages 101
  21. R101: Building COVID-19 Tracker App from scratch
  22.1 Geo/Spatial coding and visualization with R. Part 1
  22.2 Text Analysis with R. Part 1

and a number of short “How To” tutorials, such as:

  22.3 Dual Coding - Python and R unite!
  22.4 Working with ggtables
  22.5 Automate common look and feel of your ggplot graphs

and others.

Part VI presents the outputs of the community development such as Shiny Web Apps and other codes and libraries developed by community members.

Finally, the Appendix includes the schedule and agendas for the R4GC community “Lunch and Learn” meetups, Release Notes, plans, and an invitation for collaboration, in particular to make this knowledgebase bilingual - in support of bilingualism in Canada and as another opportunity for applying data science skills for the public good.

About this book

This book is built using the bookdown R package in RStudio. It is hosted at the Open Canada GitHub site https://open-canada.github.io/r4gc. Its source code is located at https://github.com/open-canada/r4gc.

Thus built, the book enables easy collaboration, transparency and peer review. Additionally, as with any R Markdown file, besides the output visible in the final compiled html file, it also allows one to gather and save discussion comments and draft ideas that are visible only inside the source Rmd file.

Other formats

One of the beauties of the bookdown R package is that it allows one to generate multiple kinds of outputs from the same source .Rmd files, such as pdf, epub, and various html formats (single-page html, multiple-page two-column and multiple-page three-column formats, with and without interactive menus). The alternative formats of this R4GC book are available here.

How to contribute

Any chapter of this book can be edited by simply clicking on the “edit” button, which leads to the corresponding source Rmd file in the book’s repo. There you can make a change to the document (in doing so, the repo will be forked to your GitHub account) and submit it to the book editor by opening a pull request. Alternatively, you can always contact the R4GC group lead at the contact listed below and attend the R4GC weekly meetups.

About authors

Dmitry Gorodnichy is a Research Data Scientist with the Chief Data Office at the Canada Border Services Agency.

Patrick Little is an Advisor on the Open Government systems team at the Treasury Board of Canada Secretariat.

Contacts and feedback

For questions and feedback, please contact the group lead and moderator, Dmitry Gorodnichy.

I General discussions

Why R?

R is one of the fastest-growing programming languages and environments for data science, visualization and processing. R integrates well with Python and other free and commercial data science tools. It can also do things that other tools cannot, and it is very well supported by a growing international community.

It also comes with RStudio - a free Integrated Development Environment (IDE) that is now supported by most GC agencies and departments, and that is becoming one of the main tools for data-related problems worldwide. For Microsoft data users: anything you do with Excel, Access or Power BI, you can also do with R and RStudio.

10 main reasons to use R for Data Science

There was much discussion around this topic, and below are our own top 10 reasons to use R for Data Science - filtered and refined by the members of this community.

  1. Advanced graphics with ggplot2 and its extensions
  2. Automated report/tutorials/textbooks generation with RMarkdown
  3. Streamlined package development with devtools
  4. Streamlined Interactive interfaces and dashboards development and deployment with Shiny
  5. “Best for geocomputation”
  6. Common tidy design shared across packages
  7. Curated peer-tested repo of packages at CRAN
  8. RStudio IDE (Integrated Development Environment) on desktop and cloud (rstudio.cloud)
  9. Full support and inter-operability with Python from the same IDE
  10. Global RStudio-led movement for R education and advancement (rstudio.com)

R vs. Python

Main references:

The latter also has a nice summary of PROs vs. CONs for both languages, which I subjectively summarize below:

Python’s PROs:

  • Object-oriented language (this is a big one for me)
  • General purpose, i.e. you can use it elsewhere (e.g. with Raspberry Pi - www.raspberrypi.org, as I did for one of my daughters’ projects)
  • Simple and easy to understand and learn (this could be subjective, but I would generally agree that R is not taught the way I personally would teach it, i.e. based on computer science principles rather than on memorizing a collection of various heuristic tools and functions)
  • Efficient (fast) packages for advanced machine learning activities, e.g. tensorflow or keras (which you can use from R too, but it could be less efficient), and also a large collection of audio manipulation / recognition packages (some of which I played with for one of my biometrics projects)

Python’s CONs:

  • Not really designed for advanced manipulation or visualization of data

R’s PROs:

  • It is designed specifically for data tasks
  • CRAN provides >10K peer-tested (!) packages for any data task you may think of. (I would also add: it provides great tools to get your OWN work tested and, one day, featured on CRAN - which is what we are doing now at our Learn R Meet-ups!)
  • ggplot2 graphics are unbeatable
  • Capable of standalone analyses with built-in packages. (Does anyone know what this means?)

R’s CONs:

  • Not the fastest or most memory-efficient (Hey - that’s without the data.table package! I don’t think this holds if you use data.table properly)
  • Object-oriented programming is not native or as easy in R, compared to Python. (Actually, I added this one. To me, this is one of the main motivations to see how to use both Python and R, instead of R by itself. For someone else, though, this may not be a deal-breaker)

Python and R unite!

Check out our latest Lunch and Learn meetup on GCcollab: “Dual Coding - Python and R unite!”

It shows how to execute R from Python and vice versa, inspired by this blog post: https://www.kdnuggets.com/2015/10/integrating-python-r-executing-part2.html

The code used is available from GCcode: https://gccode.ssc-spc.gc.ca/r4gc/resources/howto
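As a minimal sketch of the command-line approach (an illustration, not the code presented at the meetup), R can run a Python script through `system2()` and capture what it prints. The two-line Python script and the search for a `python3` or `python` executable on the PATH are assumptions made for this example.

```r
# Sketch: execute a Python script from R and capture its output.
# Assumption: a Python interpreter is reachable on the PATH.
candidates <- Sys.which(c("python3", "python"))
python <- unname(candidates[nzchar(candidates)])[1]  # NA if none is found

# Hypothetical Python script that adds its two command-line arguments.
script <- tempfile(fileext = ".py")
writeLines(c("import sys",
             "print(int(sys.argv[1]) + int(sys.argv[2]))"), script)

result <- NA_character_
if (!is.na(python)) {
  # stdout = TRUE returns the printed lines as a character vector
  result <- system2(python, args = c(shQuote(script), "2", "3"), stdout = TRUE)
}
print(result)
```

The reverse direction works the same way from Python, by calling `Rscript` through the `subprocess` module.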

Other resources

Best way to learn R

The number of resources and ways to learn R is enormous. Some of us have tried many of them before finding the ones that we believe are the best.

Here are some: https://ivi-m.github.io/R-Ottawa/resources.html

And of course, never be shy to ask questions (or to search the many existing answers) at https://stackoverflow.com/. In fact, it is a great place to save the knowledge you acquire! By helping yourself, you also help others and contribute to the further improvement of many R packages.

Great bookdown textbooks

This book is prepared using the bookdown R package.

It is inspired by many other open source codes and examples. Those of particular value in developing this book are listed below.

See also the textbook references for each of the community discussion topics later in the book.

Events and Forums for R users

Other R communities in GC

Roughly sorted by the level of group activity

Other groups

RStudio-Enterprise-Community-Meetup

Another great meetup opportunity:

https://www.meetup.com/RStudio-Enterprise-Community-Meetup

NB: you may have to access it from your personal device

Each meetup is dedicated to a specific topic, following which they post many Q&A at https://community.rstudio.com/

R conferences

(from https://rviews.rstudio.com/2021/03/03/2021-r-conferences)

Cascadia RConf 2021 (June 4 - 5), a jewel of a regional R conference for its first three years, was canceled in 2020. It is back this year as a virtual event. The Call for Presentations is open.

useR! 2021 (July 5 - 9) has an outstanding lineup of keynote speakers. The program is very likely to make US-based attendees night owls.

EARL Conference 2021 (September 6 - 10), the premier R in industry event, will be online this year. The call for abstracts is already open.

Other venues for submitting your R / Data Science work:

https://peerj.com/articles/?q=data

rpubs.com

https://earlconf.com/#about: The Enterprise Applications of the R Language Conference (EARL)

https://odsc.com/boston/: Data Science Conference & Expo

https://journals.plos.org/plosone/search?filterJournals=PLoSONE&q=data+science (research by the US Department of Homeland Security is published there)

See also (from http://charteredabs.org/academic-journal-guide-2015-view/, Rank 4-3):

Management Science

IEEE Transactions on Systems, Man, and Cybernetics: Systems (formerly “IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans”)

Annals of Operations Research

Operations Research

IEEE Transactions on Cybernetics (formerly “IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews)”)

INFORMS Journal on Computing

OR Spectrum

Transportation Science

Computational Statistics & Data Analysis

Academic Conferences:

Canadian AI Conference - https://www.caiac.ca/en/conferences

Academic Journals:

Operations Research (Informs): https://pubsonline.informs.org/journal/opre

Journal of Data Analysis and Information Processing: https://www.scirp.org/journal/jdaip/

IEEE Conferences:

IEEE BigData

IEEE International Conference on Technologies for Homeland Security (HST)

II Art of efficient R coding

From Excel to R

There was keen interest expressed at last Friday's meetup in transitioning from Excel to R. Incidentally, there was an RStudio Community meetup focused exactly on this topic: “Making the Shift from Excel to R: Perspectives from the back-office”.

Many Q&A from this meetup are posted here:

https://community.rstudio.com/t/meetup-making-the-shift-from-excel-to-r-perspectives-from-the-back-office/100467

This dictionary of Excel/R equivalents was very useful in finding a starting point for common functions: https://paulvanderlaken.com/2018/07/31/transitioning-from-excel-to-r-dictionary-of-common-functions/

For people who are looking for something more comprehensive, this website is very useful: https://rstudio-conf-2020.github.io/r-for-excel/

For my part, I subscribe to the approach proposed by The Carpentries: use Excel for data entry and part of the quality control. The data are then exported to csv to be imported into R.

data.table: your best fRiend

The data.table package developed by Matt Dowle is a game changer for many data scientists.

Learn about it and share your favourite data.table tricks here:

https://github.com/Rdatatable/data.table

https://github.com/chuvanan/rdatatable-cookbook

http://r-datatable.com (https://rdatatable.gitlab.io/data.table/)

https://www.datacamp.com/courses/time-series-with-datatable-in-r

https://www.datacamp.com/courses/data-manipulation-in-r-with-datatable

https://github.com/Rdatatable/data.table/wiki/Articles

https://rpubs.com/josemz/SDbf - Making .SD your best friend

data.table vs. dplyr

The data.table (“computer language”) way vs. the dplyr (“English language”) way of replacing a value in a column:

  1. The best: no wasted computations and no new memory allocations.
     dtLocations %>% .[WLOC == 4313, WLOC := 4312]
  2. No new memory allocations, but computations are done on ALL rows.
     dtLocations %>% .[, WLOC := ifelse(WLOC == 4313, 4312, WLOC)]
  3. The worst: computations are done on ALL rows and, furthermore, the entire data is copied from one memory location to another. (Imagine that your data has 1 million cells, of which only 10 need to be changed!)
     dtLocations <- dtLocations %>% mutate(WLOC = ifelse(WLOC == 4313, 4312, WLOC))

NB: dtLocations %>% .[] is the same as dtLocations[], so you can use it in pipes.

Conclusion: Use data.table instead of dplyr (i.e. tibbles) for speed and efficient coding!
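The in-place behaviour of the first way can be checked directly. The sketch below uses a small made-up table in place of dtLocations (its contents are an assumption) and compares the memory address of the table before and after the update with `data.table::address()`. The whole-column pattern of ways 2 and 3 is shown in base R only, to keep the example free of extra packages.

```r
library(data.table)

# Hypothetical toy table standing in for dtLocations.
dtLocations <- data.table(id = 1:6,
                          WLOC = c(4311L, 4313L, 4312L, 4313L, 4310L, 4313L))
before <- address(dtLocations)   # memory address of the table

# Way 1: only the matching rows are touched, and the update is in place.
dtLocations[WLOC == 4313L, WLOC := 4312L]

same_object <- identical(address(dtLocations), before)
print(same_object)
print(dtLocations$WLOC)

# Whole-column pattern: ifelse() evaluates every row and the result is a new
# vector that replaces the entire column.
df <- data.frame(id = 1:6, WLOC = c(4311L, 4313L, 4312L, 4313L, 4310L, 4313L))
df$WLOC <- ifelse(df$WLOC == 4313L, 4312L, df$WLOC)
```

Both approaches end with the same values; the difference is in how much work and memory it takes to get there, which matters only as the table grows.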

Extensions of data.table

There’s considerable effort to marry data.table package with dplyr package. Here are notable ones:

Reading various kinds of data in R

vroom

My favourite method for reading / writing “regular” .csv files has been ‘data.table::fread() / fwrite()’ - the fastest, and automated in many ways. Now there’s another one, the ‘vroom’ package: https://cran.r-project.org/web/packages/vroom/vignettes/benchmarks.html
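For readers new to these functions, here is a minimal round trip with `fwrite()` and `fread()`. The table contents and the temporary file are made up for illustration.

```r
library(data.table)

# Hypothetical sample data written to a temporary .csv file.
scores <- data.table(id = 1:3,
                     dept = c("ECCC", "ISED", "CBSA"),
                     score = c(1.5, 2.25, 3))
path <- tempfile(fileext = ".csv")
fwrite(scores, path)

# fread() detects the separator, header and column types automatically.
all_cols <- fread(path)

# select= reads only the listed columns, which saves time on wide files.
two_cols <- fread(path, select = c("id", "score"))

str(all_cols)
```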

Then, of course, there are other kinds of data you want to read - efficiently (meaning, automatically and fast):

  • bad data, badly formatted data, sparse data,

  • distributed “big” data

  • just very large and very very large

  • from MS Excel, MS Word

  • from clouds: AWS, MS Azure etc

  • from pdf, html

  • from zip files

  • from google docs, google sheets

  • from GCdocs and other GC platforms (that was one of the questions at our Friday R meetup)

  • and finally, from all other IoT and web-crawling sources

readxl and xlsx

For reading Excel files, I have so far used readxl. Nevertheless, I would like to be able to import a set of non-contiguous columns from a sheet (something that the newer versions of Excel allow you to select using data queries).

For writing Excel files, I used xlsx, as I definitely need to be able to write multiple sheets to a file.
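One possible way to read non-contiguous columns with readxl is to mark the unwanted columns as "skip" in `col_types`. The sketch below uses the example workbook that ships with readxl. For writing several sheets it swaps in the writexl package (an assumption made here because it needs no Java, not the xlsx package mentioned above).

```r
library(readxl)

# Example workbook that ships with readxl; its "iris" sheet has five columns.
path <- readxl_example("datasets.xlsx")

# One col_types entry per column: "skip" drops a column, so this keeps
# columns 1, 4 and 5 (a non-contiguous set).
picked <- read_excel(path, sheet = "iris",
                     col_types = c("numeric", "skip", "skip", "numeric", "text"))
print(names(picked))

# Writing several sheets: a named list of data frames, one sheet per element.
# Assumption: the writexl package is used here instead of xlsx.
out <- tempfile(fileext = ".xlsx")
writexl::write_xlsx(list(cars = head(mtcars), flowers = head(iris)), out)
print(excel_sheets(out))
```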

Discussion

The analyst should never stick to one solution, but rather adapt to the needs of the project. Every format that is good and efficient with big data comes at a price: either 1) manipulation overhead that makes no sense for small datasets (e.g. feather, which can be slower than plain data frames on small data but hundreds of times faster on big data), or 2) long waits and wasted storage space (e.g. parquet) if the data is not big enough. Yet once you find the right solution for each size and need, it makes a world of difference.

The article below compares some popular formats when used with Pandas (Python). You will get similar results if you try the same experiment in R. https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d

One of the options that I recommend, if you are working only locally and not in the cloud, is using the feather format with SQL.

If you need to extract data from a database and do more advanced data engineering without loading data in your RAM, you need SQL to prepare the extraction and do basic to advanced manipulation (SQL is Turing-complete, eh).

For more advanced and permanent transformations to the data, you need stored procedures (SQL again).
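To illustrate the idea of letting SQL do the heavy lifting before anything is loaded into R, here is a minimal sketch with the DBI package. An in-memory SQLite database and a made-up `requests` table stand in for a real database (both are assumptions); with a real server only the connection line would change.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table of requests; in practice the table already exists.
dbWriteTable(con, "requests",
             data.frame(dept = c("A", "A", "B", "B", "B"),
                        pages = c(10, 20, 5, 5, 15)))

# The aggregation runs inside the database; only the summary reaches R.
summary_by_dept <- dbGetQuery(con, "
  SELECT dept, COUNT(*) AS n_requests, SUM(pages) AS total_pages
  FROM requests
  GROUP BY dept
  ORDER BY dept")

dbDisconnect(con)
print(summary_by_dept)
```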

And if you play in the cloud, this is even more essential. For example, in AWS, you can create user-defined functions in Redshift using Python and Postgres SQL, but not R. All manipulation needs to be done in SQL, and Python can be used for other purposes such as calculations and fuzzy matching.

You can still use R in the rebranded Jupyter notebooks (SageMaker in AWS, Azure Notebooks in Azure), but R is not as widely compatible with other cloud applications as SQL and Python. - [PD: But you can absolutely use R in AWS for ETL. In fact, you could even set up API endpoints via plumber; there’s a whole AWS tutorial that deals with this issue.]

References:

https://github.com/pingles/redshift-r/ - Provides a few functions to make it easier to access Amazon’s Redshift service from R.

http://www.rforge.net/RJDBC/index.html - install.packages("RJDBC", dep=TRUE). RJDBC is a package implementing DBI in R on the basis of JDBC. This allows the use of any DBMS in R through the JDBC interface. The only requirement is working Java and a JDBC driver for the database engine to be accessed.

feather (for data larger than a gigabyte): https://blog.rstudio.com/2016/03/29/feather/

parquet (for very large files): https://campus.datacamp.com/courses/introduction-to-spark-with-sparklyr-in-r/case-study-learning-to-be-a-machine-running-machine-learning-models-on-spark?ex=4

Conclusions: As a side note on size, speed, and performance: it all depends on what you do, the delays, and the cost.

For example, if you use the cloud:

  • If your data is going to be queried very often, so that large volumes of data would be scanned, move your processing to a runtime-billed tool (e.g. Redshift in AWS) rather than a data-billed tool (e.g. Athena in AWS). Otherwise, your cost may grow very quickly if users can query data freely from, say, Tableau dashboards without caring about the actual amount of data that is scanned. With a runtime-billed tool, even if the data is queried around the clock, your cost is stable and won’t increase with volume.
  • If you scan large volumes only once or twice a day, you have to compare the costing options.
  • If the costing model is incremental by runtime and you have very large amounts of data that you need to query quickly, it is best to use columnar-formatted tables such as parquet. There is a cost and delay involved in the conversion, and you need much more storage because of the flattened structure, so storage will be more expensive (especially considering that you clone your original data and thus use at least twice the space). However, queries will fly, and the cost of computation will be much smaller thereafter.
  • For occasional queries, a data-billed tool would likely be the best option.
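The trade-off between the two billing models can be made concrete with a little arithmetic. In the sketch below all rates and volumes are hypothetical numbers chosen for illustration (they are not actual AWS prices): a data-billed tool charges per terabyte scanned, while a runtime-billed tool charges per hour that the cluster is up.

```r
# Hypothetical rates, for illustration only (not actual AWS prices).
data_billed_cost <- function(queries_per_day, tb_per_query, rate_per_tb) {
  queries_per_day * tb_per_query * rate_per_tb
}
runtime_billed_cost <- function(hours_per_day, rate_per_hour) {
  hours_per_day * rate_per_hour
}

queries <- c(2, 50, 1000)   # occasional, regular and heavy use
daily <- data.frame(
  queries_per_day = queries,
  data_billed     = data_billed_cost(queries, tb_per_query = 0.1, rate_per_tb = 5),
  runtime_billed  = runtime_billed_cost(24, rate_per_hour = 1)
)
print(daily)

# Number of queries per day above which the runtime-billed tool is cheaper.
break_even <- runtime_billed_cost(24, 1) / (0.1 * 5)
print(break_even)
```

Under these made-up rates the data-billed cost grows with every query while the runtime-billed cost stays flat, so the occasional user is better off with the former and the heavy user with the latter.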

If you want to prototype with small datasets, do not lose time with parquet. CSV is the worst format after Excel files (which need to be unpacked and repacked) in any scenario, but for small data the time invested in converting it is not worth it at all. data.table and DT will be your best friends in R.

As for using SQL vs. packages such as dplyr: I mentioned a gain in performance, but be careful. If you use raw SQL, you will see a big gain in performance. However, there are packages that translate SQL into R- or Python-interpretable code, and those may be slower because of the interpretation layer. dplyr, on the other hand, is quite efficient and well optimized. As usual, it depends on the package. In R, the sqldf package should be good, if you want to try it out.

Efficient programming in R

TBA

Debugging

TBA

RStudio tricks

Coding online

TBA

Running multiple RStudio versions

Do you want to be able to run multiple versions of RStudio on Windows? You can do so with the following executable .bat script.

@echo off
REM Run-RStudio-1.4.bat
title Starting RStudio!
set "RSTUDIO_BIN=C:\Users\abc123\Downloads\RStudio-1.4.1106\bin"
echo Hello Dear,
echo Starting RStudio 1.4 (from %RSTUDIO_BIN%) for you...

REM "start" launches RStudio and returns at once, so this console window can close.
REM Keep one such script per RStudio version, each pointing to its own folder.
start "" "%RSTUDIO_BIN%\rstudio.exe"

III Visualization and Reporting

R Markdown for literate programming and automated reports

This discussion thread is for gathering knowledgebase related to R Markdown: https://rmarkdown.rstudio.com/

It can be used to generate reports, slides, websites, dashboards, Shiny apps, books and emails. It is the tool that allows you to do literate programming, which - as defined by Donald Knuth - is the type of programming where human language is combined with computer language, making the code much easier for your colleagues and yourself to understand, and making coding much more fun.

Automated generation of multiple PDF files

Question: “Can we automatically generate multiple PDF files using R Markdown? E.g. 20 report cards in PDF format, each showing results for 50 different parameters of the analysis?”

Answer: “Absolutely!” - This is what R Markdown was developed for.

Here’s a tiny R script example showing how to do it.

Sources:

  • https://stackoverflow.com/questions/67739377/passing-multiple-parameters-in-rmarkdown-document
  • https://stackoverflow.com/questions/60203785/using-multiple-params-in-rmarkdown-yaml-fields
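As a minimal sketch of this approach (an illustration, not the script from the sources above), a single parameterized template can be rendered in a loop, once per report card. The template name `report_card.Rmd`, its `card_id` parameter and the output file names are hypothetical. The loop runs only if the template, the rmarkdown package and pandoc are actually present.

```r
# Sketch: one PDF per report card from a single parameterized template.
# Assumptions: "report_card.Rmd" is a hypothetical template whose YAML header
# declares `params: card_id` and `output: pdf_document`.
template <- "report_card.Rmd"
card_ids <- sprintf("%02d", 1:20)                    # 20 report cards
output_files <- paste0("report_card_", card_ids, ".pdf")

# Only render when the template, rmarkdown and pandoc are actually available.
can_render <- file.exists(template) &&
  requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  for (i in seq_along(card_ids)) {
    rmarkdown::render(
      input       = template,
      output_file = output_files[i],
      params      = list(card_id = card_ids[i]),
      envir       = new.env()   # fresh environment for each report
    )
  }
}
print(head(output_files, 3))
```

Inside the template, the value is available as `params$card_id`, so the same document can filter the data and produce the results for that one card.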

ggplot2 and its extensions for data visualization

Many come to R (from Python and other languages / systems) mainly because of the advanced data visualization capabilities it offers. There are many of these, and as many resources. Share your recommendations and examples here.

Resources

See: https://github.com/IVI-M/R-Ottawa/blob/master/resources.md

Introduction to ggplot2

How to customize ggplot2

Top 50 ggplot2 visualizations

ggplot2 quick reference sheet

ggplot2 quick reference for colour names

Modifying components of theme() in ggplot2

Colours in ggplot2, including colour-blind friendly palette

Package for colour manipulation, including visualizing the effects of colour-blindness on colour palettes. https://cran.r-project.org/package=prismatic

bookdown.org books on advanced visualization with ggplot and its extensions

https://serialmentor.com/dataviz/ - https://github.com/clauswilke/dataviz - Fundamentals of Data Visualization. Claus O. Wilke

https://github.com/jjallaire/dataviz-r

OpenSDP Dataviz Tutorial (R)

Plotly + R Shiny

https://socviz.co/ - Data Visualization. A practical introduction

https://r-graphics.org/ - R Graphics Cookbook, 2nd edition. Winston Chang. 2019-12-19

with Plotly in R

Intro to Animations | R | Plotly - https://plotly.com/r/animations/

https://plotly-r.com/index.html

https://plot.ly/r/animations/

Shiny for Interactive Data Visualization, Analysis and Web App development

This discussion thread is dedicated to the Shiny package - an RStudio-curated tool for developing and deploying interactive data visualization and analysis tools and applications.

Using R with GC data infrastructure (gcdocs, AWS, etc)

gcdocs

Q:

I’m wondering if anyone has had any success accessing data from GCDocs in your respective departments? I believe that GCDocs is implemented across most (if not all) departments, so I am wondering if there are any existing solutions to read/write data from it.

Also wondering whether any of you have had any luck accessing Microsoft 365 via R as well? I’ve had success with the Microsoft365R package (https://github.com/Azure/Microsoft365R) with a personal account, but it doesn’t play well (at least not in my department - ISED) with business accounts.

A: I tried using the Microsoft365R package to access my departmental (ECCC) email without success. When I tried to access it, a window popped up allowing me to request access authorization, so I clicked the “Submit” (or whatever) button. That was weeks ago. Never heard anything more about it.

After many weeks of trying many ways, I found that this is NOT possible. GCdocs runs internal code on its end that validates that you have the right to access a document and then, if you do, also logs your action in the document’s “Audit” attribute. So we still always have to make a local copy of the data (manually!), and only then can we process it from R.

Interactive Outputs in R: plotly, Datatable, reactable

This discussion is dedicated to tools to generate interactive graphs, tables, and other content without Shiny (which, as you know, requires a server to host your Shiny application and which for this reason cannot be easily shared with your clients, e.g. by email)

Here are the most popular ones:

https://rstudio.github.io/DT/
https://plotly.com/r/ and
https://glin.github.io/reactable
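As a minimal sketch of the "interactive without Shiny" idea, the snippet below builds an interactive table with DT and defines a small helper that writes it to a single HTML file that can be attached to an email. It assumes the DT and htmlwidgets packages are installed; the helper name `save_standalone` and the output file name are hypothetical.

```r
library(DT)

# An interactive, sortable and searchable table built from a built-in dataset
tbl <- DT::datatable(head(mtcars, 10), caption = "First 10 rows of mtcars")

# Hypothetical helper: write any htmlwidget to a single HTML file.
# selfcontained = TRUE embeds all scripts in that file (this step needs pandoc).
save_standalone <- function(widget, file) {
  htmlwidgets::saveWidget(widget, file = file, selfcontained = TRUE)
  invisible(file)
}

# Uncomment to write the file:
# save_standalone(tbl, "mtcars_table.html")
```

The same helper should work for plotly and reactable outputs, since both are htmlwidgets as well.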

reactable

Check what kind of interactive coloured tables you can make with it: https://glin.github.io/reactable/articles/womens-world-cup/womens-world-cup.html

IV Machine Learning and AI

Record Linking and other Data Engineering tasks in R

demo

See http://rcanada.shinyapps.io/demo and the #GCData2021 Data Engineering workshop presentation - for the backgrounder and the demonstration of various DE tasks and solutions.

Geo/Spatial coding and visualization in R

Resources

There’s much effort across many GC departments to analyze and visualize geo-data. This discussion is the place to share your results, ideas or problems related to the problem.

Below is a great resource to start, which also provides a nice explanation on why R is believed to be the best language to do this kind of work.

Geocomputation with R, a book on geographic data analysis, visualization and modeling.

The online version of the book is hosted at https://geocompr.robinlovelace.net and kept up-to-date by GitHub Actions

Federal Geospatial Platform

From https://gcconnex.gc.ca/discussion/view/84695812/data-federation-federation-des-donnees

Yukon’s open geospatial data is now searchable on the Federal Geospatial Platform (FGP) (https://gcgeo.gc.ca/) and on Government of Canada Open Maps Portal. Check it out !

With the addition of Yukon’s resources, the FGP now offers over 5,000 datasets available to discover and download, for public servants and Canadians, all in one location.

Wondering how to create a map with your data? We can help you do so, from A to Z, at no cost. Contact us at

Tutorials

Check codes and notes for the two Tutorials that we had on this subject this summer:

https://gccode.ssc-spc.gc.ca/r4gc/resources/introSpatialAnalysis

Dealing with memory issues

A blog post that illustrates a few ways to avoid overloading R’s memory when working with large spatial objects (here’s looking at you, 30-m land cover map of North America!).

https://www.ecologi.st/post/big-spatial-data/

The two other posts on that blog also have some really nice tips for general R coding.

Canadian geo-data

Useful code and R packages from public domain to work with Canadian geo-data.

From https://mountainmath.ca:

  • https://github.com/mountainMath/mountainmathHelpers
  • tongfen: Convenience functions for making data on different geometries, especially Canadian census geometries, comparable.
  • cancensus: R wrapper for calling CensusMapper APIs
  • cansim: Wrapper to access CANSIM data
  • CanCovidData: Collection of data import and processing functions focused on Canadian data

Text Analysis in R

https://gccollab.ca/discussion/view/7404441/text-analysis-in-r

Good place to start:

https://smltar.com/
https://www.tidytextmining.com/
https://slcladal.github.io/topicmodels.html
https://slcladal.github.io/textanalysis.html

Also, a list of related resources and codes for text mining (including Web Scraping) is on GitHub: https://github.com/gorodnichy/LA-R-text

Plagiarism detection

Q: Any ideas/packages/resources (in R) for plagiarism detection?

A: A good place to start is the “stylo” package (https://github.com/computationalstylistics/stylo - R package for stylometric analyses) which implements a wide variety of recent research in computational stylistics. Plagiarism detection is fraught (insert all of the usual ethical and computational caveats…), but stylo can help you identify passages that are stylistically unusual compared to the rest of the text. Unusualness definitely isn’t a proxy for plagiarism, but it’s a good place to start.

Q: Is this focused on English language text? Are there lexicons or libraries for comparison within other languages (e.g., French)?

A: Stylo works well with quite a few non-English languages. French, for example, is supported, as are a number of languages with non-Latin alphabets like Arabic and Korean.

results from International Methodology Symposium

Two presentations at International Methodology Symposium were about Text Analysis with R, with great ideas from both:

11B-4 by Andrew Stelmach from StatCan: used library(fastText), a very powerful package from the Facebook AI team for efficient learning of word representations and sentence classification.

11B-2 by Dave Campbell from Carleton U: used the approach that we discussed at L&L on October 9 (based on bag-of-words cosine distance / correlation, for matching beer product descriptions - https://gccode.ssc-spc.gc.ca/r4gc/resources/text/), but additionally applied SVD (singular value decomposition) to reduce the comparison to the most important words, thus significantly reducing the dimension and the computation time.

You can find their decks here: https://drive.google.com/drive/folders/1TfuNmG3V8IEKDNNTcMZz7_YCKgqVVBju
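To make the bag-of-words plus SVD idea concrete, here is a small base R sketch. It is not the presenters' code: the three product descriptions are made up, and the number of retained dimensions `k` is an arbitrary choice for illustration.

```r
# Three toy product descriptions (made-up data, for illustration only)
docs <- c(
  a = "pale ale with citrus hops",
  b = "citrus pale ale hops",
  c = "dark stout with roasted malt"
)

# Bag of words: one row per document, one column per distinct word
tokens <- strsplit(tolower(docs), "[^a-z]+")
vocab  <- sort(unique(unlist(tokens)))
dtm    <- t(sapply(tokens, function(w) table(factor(w, levels = vocab))))

# Cosine similarity between all pairs of rows of a matrix
cosine_sim <- function(m) {
  unit <- m / sqrt(rowSums(m^2))  # scale each row to length 1
  unit %*% t(unit)
}
sim_full <- cosine_sim(dtm)

# SVD: keep only the k strongest directions, then compare in that smaller space
k <- 2
s <- svd(dtm)
reduced <- s$u[, 1:k] %*% diag(s$d[1:k])
rownames(reduced) <- rownames(dtm)
sim_reduced <- cosine_sim(reduced)
```

With real data the document-term matrix has thousands of columns, so comparing documents in the `k`-dimensional space is where the savings come from.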

Machine Learning and Modeling in R

Resources

Below are the most popular textbooks for Machine Learning and Modeling in R.

This one is used in many university Machine Learning courses (e.g. at the University of Arizona and many others in the USA):

An Introduction to Statistical Learning with Applications in R

https://www.statlearning.com/ ( https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf )

These are on Tidy models (most come with source codes): https://www.tidymodels.org/books/

https://www.tmwr.org/ - Tidy Modeling with R

https://bookdown.org/max/FES - Feature Engineering & Selection

https://moderndive.com/. Statistical Inference via Data Science

Deep learning with Tensorflow

We can also have a separate discussion thread on Deep Learning with Tensorflow and Keras in R (https://tensorflow.rstudio.com/)

Also see Discussion on Text Mining (https://smltar.com/ and https://www.tidytextmining.com/)

V Community Tutorials

R4GC community “Lunch and Learn Data Science with R” weekly meetups : (NB: you need to join the “Lunch and Learn Data Science with R” meetups group to access recordings of these sessions)

GCCode 101

title: “gccode101: working with GCCode”
subtitle: (With focus on how to do it using RStudio)
date: September 2020

The information presented here includes only open public domain knowledge and does not include specific details related to the operation of each GC department. The full tutorial and related Q&A are available at https://gccode.ssc-spc.gc.ca/r4gc/resources/gccode101

TL;DR

  1. [One time action per life time] Make sure you have RStudio and git installed on your local machine: eg from Anaconda - Ask your IT to help you.

  2. [One time action per project] Go to GCCode and create a new repo there (it can be left empty, or you can just add a README.md), or select the existing one that you want to work on, e.g. https://gccode.ssc-spc.gc.ca/r4gc/gc-packages/packages101

  3. Generate an Access Token (from Settings in the left panel) or request one for a repo of which you are not the owner. It will look something like this: LNwVUF5YGnF-6x5fsnJ-

  4. Open Windows PowerShell (or cmd), go to the directory where you want to clone your repo (eg. cd C:_CODES_packages) and run this command: git clone --progress https://oauth2:LNwVUF5YGnF-6x5fsnJ-@gccode.ssc-spc.gc.ca/r4gc/gc-packages/packages101 gc-packages101. Close it - you won’t need it again! You can check that your new directory contains a .git/ folder. (This is where your credentials for GCCode are stored.)

  5. Open RStudio and create a New project there. You have two options. In both cases, once you finish and reload your project in RStudio, you’ll see the GIT button on top, which means you are all set and can start modifying / building your code!

  • Option A (To build a new R package): Choose New project -> New Directory -> New Project (with the name of your package, eg caPSES). Leave the “Create a git repository” box UNCHECKED. Once it is created, MOVE the entire content of this new package folder to the directory that you cloned in the previous step (which contains the .git/ folder), or, vice versa, just move the .git/ folder from the cloned directory to your new package directory.

  • Option B (for any other project): Choose New project -> Existing directory -> point to your new cloned directory. That’s it.

  6. [Every time you make changes in project]

  • We recommend to always pull first (to avoid conflicts later) - from the Git menu button
  • Make changes, open the commit window, describe them, commit them, push them back to the repo
  • Enjoy the rest of your day!
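The clone command in the steps above embeds the access token directly in the URL. As a hedged sketch, the R helper below assembles the same kind of URL from a token kept in an environment variable, so the token does not have to be typed into scripts. The function name `build_clone_url` and the variable name `GCCODE_TOKEN` are hypothetical, and the token shown is a placeholder.

```r
# Build an HTTPS clone URL that embeds a GitLab access token.
# `build_clone_url` and the GCCODE_TOKEN variable name are hypothetical.
build_clone_url <- function(repo_path,
                            token = Sys.getenv("GCCODE_TOKEN"),
                            host = "gccode.ssc-spc.gc.ca") {
  if (!nzchar(token)) stop("Set the GCCODE_TOKEN environment variable first.")
  sprintf("https://oauth2:%s@%s/%s", token, host, repo_path)
}

# Example with a placeholder token (not a real credential)
url <- build_clone_url("r4gc/gc-packages/packages101", token = "YOUR-TOKEN")

# To run the clone from R instead of PowerShell (requires git on the PATH):
# system2("git", c("clone", "--progress", url, "gc-packages101"))
```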

More details from our fRiday’s Lunch and Learn discussions follow below.

Step 00: Connecting to GCCode and installing required software.

“Connecting” means being able to pull a repository to your local (gc-network connected) machine, modify code there (RStudio is the easiest), commit your changes, and push them back to GCCode. Before doing that, you need the following programs installed on your local (gc-network connected) machine.

Below (*) indicates options that have been tested as most efficient

  • R:

    • from Anaconda
    • from CRAN*: R version 4.0.2 (2020-06-22): As of Sept 2020, we are allowed to install packages directly from CRAN. So you can do library(installr); updateR()
  • RStudio:

    • From Anaconda Prompt in a new environment*: conda create -n e2020.03.02-markdown_issues mro-base rstudio (replace e2020.03.02-markdown_issues with YOUR_NEW_ENVIRONMENT_NAME). This will create a shortcut (R) on the desktop. - old: Version 1.1.456 – © 2009-2018 RStudio, Inc.
  • Git:

    • From Anaconda Prompt*: conda install git (or you can install in new environment e2020.03.02-markdown_issues)
    • From Anaconda Prompt: conda install m2-git (I installed in new environment e2020.07.21_mintty)
      • Q: where is git.exe actually located in c: drive ? cd \ .
      • A: C:.21_mintty
    • Check with IT - from web: https://git-scm.com/download/win (Git for Windows Portable (“thumbdrive edition”) - 64-bit Git for Windows Portable. NB: it’s not accessible from CBSA network)
  • Command windows (where you can run git commands):

    • Windows cmd (it may or may not work there for your machine)
    • Anaconda prompt (this does not take all shell commands)
    • mintty* ( from www.msys2.org), install: conda install m2-base (creates ~/anaconda3/Library/mingw64 directory, git.exe installed with m2-git will be placed in /bin there).
      • You can then run it from conda terminal: mintty
      • Run which git (linux style from mintty) or where git (windows style from mintty) to find out the path to git.exe
      • Note how Linux-like file system is mapped to Windows’ one below:
        • $ where git: C:.exe
        • $ which git: /usr/bin/git
    • Windows PowerShell: works well but does not take all linux commands (e.g. it does not know which/where, ls -a)
    • RStudio Terminal ** : it runs git from the active environment from which you started RStudio. This allows you to use different git.exe settings or executables.
  • There are additional recommended packages for efficient source control and collaborative code development, which however can be learnt later, such as :

    • renv
    • docker
    • devtools

Step 0: Configuring Windows, Git and GitLab (tokens)

1: Edit the environment variables for your account (search “env”) and set HOME to /c/users/gxd006/ (replace gxd006 with your user id). This will become your home directory ~ for mintty, and this is where the .gitconfig file used by git in mintty will reside. It is needed for the next step below.

  • Test it: $ echo $HOME: /c/Users/gxd006/

2: Edit the .gitconfig file as follows (note: it may be invisible to your OS; the easiest way to open/edit it is using RStudio).

Alternatively, you can view/edit it using the built-in vim editor: vim .gitconfig, or by running git config --global -e. The three main vim commands are: ESC a (insert text after the cursor), ESC :wq (save and exit), ESC :q! (exit without saving).

NB: if you run the terminal from RStudio (rather than from mintty), then git config --global -e will open the .gitconfig file that belongs to the git executable found with where git

[user]
     name = Your Name
     email = your.email@cbsa-asfc.gc.ca
[http]
     proxy = "http://proxy.omega.dce-eir.net:8080"

NB: if you do not do that, you will get this error:

fatal: unable to access 'https://gccode.ssc-spc.gc.ca/super-koalas/shared-code/': Failed to connect to gccode.ssc-spc.gc.ca port 443: Timed out

3: Decide how/where you will organize your gccode on your machine. This is how I converged (after many iterations) to organize my projects - see file_structure.md

4: If you don’t want to type your login id/password every time you connect to GCCode (which I’m sure you don’t :), read this article: https://knasmueller.net/gitlab-authenticate-using-access-token, and create your new personal access token there (which will look something like tjxrg3GyUQJJDMaA6LfHA)

Step 1: Find (or create) a GitLab project you want to contribute to.

Lets say, you want to contribute to this project: https://gccode.ssc-spc.gc.ca/r4gc/resources/gccode101

1.1: Go to ~/_CODES/GCCodes (this is where you keep all your GCCodes projects),
and run from cmd terminal, or Anaconda prompt, or mintty:

CORRECTION: Currently this works from the conda or mintty terminal only. Q: How to make git callable from Windows cmd? I changed PATH to add the directory C:\Users\gxd006\anaconda3\Library\bin, but it did not help.

(base) C:\Users\gxd006>which git
/usr/bin/git

vs.

H:\>git
'git' is not recognized as an internal or external command,
operable program or batch file.

From Anaconda prompt, or mintty:

git clone --progress https://gccode.ssc-spc.gc.ca/r4gc/resources/gccode101 r4gc_gccode101

or better (if you set a personal token - see Step 0.4 above)

git clone --progress https://oauth2:tjxrg3GyUQJJDMaA6LfHA@gccode.ssc-spc.gc.ca/r4gc/resources/gccode101 r4gc_gccode101

Now you can go to the created directory r4gc_gccode101 and do something there, either from command line or directly from RStudio.

1.2 Using Command Line

Push a new file
cd existing_folder
touch README.md
git add README.md
git commit -m "add README"
git push -u origin master

Push an existing folder
cd existing_folder
git init
git remote add origin https://gccode.ssc-spc.gc.ca/r4gc/resources/gccode101
git add .
git commit -m "Initial commit"
git push -u origin master

Push an existing Git repository
cd existing_repo
git remote rename origin old-origin
git remote add origin https://gccode.ssc-spc.gc.ca/r4gc/resources/gccode101
git push -u origin --all
git push -u origin --tags

1.3 In RStudio

Recommended - see below

Step 2: Using Branches (optional)

It is recommended that you create your own branch for every project where you want to contribute (e.g. I made branch ivi for myself) and do everything there. NB: you can also do it from RStudio or command line.

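git fetch                 # get the latest branches from GCCode
git checkout -b ivi       # create your own branch and switch to it (omit -b if it already exists)
touch some.txt
git add some.txt
git commit -m "add some.txt"
git log                   # check that your commit is there
git status
git push -u origin ivi    # first push of a new branch
git checkout master       # switch back to master when done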
  • For the rest of the presentation, we focus on using RStudio to do everything you need with GitLab in GCCode

Step 3: GCcoding from RStudio

  • Configure git: in Global options

  • Create new Project from existing directory (point to directory where you cloned GCCode repo)

You’ll note that GIT button is visible now there ! (that’s because it knows that this directory is cloned from gitlab)

  • Make some changes

  • Click on GIT button menu -> commit

  • Check the file(s) you want to commit, describe your change, click Commit, click Push - Voila! Done.

Packages 101

title: “How to convert your functions to package(s)”

subtitle: “GC Lunch and Learn: R packages 101”
author: Dmitry Gorodnichy and Joseph Stinziano
gitlab: https://gccode.ssc-spc.gc.ca/r4gc/gc-packages/packages101
date: “March - June 2021”

Taken from: https://gccode.ssc-spc.gc.ca/r4gc/gc-packages/packages101/-/blob/master/packages101.Rmd

How to contribute

  1. Anyone: Fork Packages101 repo - modify - commit changes - push - submit request to merge

  2. Members of r4gc group:

A token is generated to allow you to push/pull Packages101 repo.
Use this line to push/pull from Packages101 repo:

git clone --progress https://oauth2:tjxrg3GyUQJJDMaA6LfHA@gccode.ssc-spc.gc.ca/r4gc/gc-packages/packages101 r4gc_packages101


* R script to start:

library(devtools)

devtools::create("rCanada") ## if you are in /my_package
devtools::create("../../rCanada") ## will it erase it, if I already have it?
## or usethis::create_package("../r4gc/packages/rCanada2")

library(roxygen2)
## library(testthis) <- we don't use this! Don't confuse it with library(testthat)

library(testthat) ## do we need it?

library(usethis)
## https://github.com/r-lib/usethis

usethis::use_testthat()
## use_testthat()

use_news_md()

use_test("iviDT")

x <- 1
y <- 2
use_data(x, y)


use_vignette(name="my_vignettes") #
use_vignette(name="data_linking")


use_package("data.table")
## > use_package("data.table")
## √ Adding 'data.table' to Imports field in DESCRIPTION
## * Refer to functions with `data.table::fun()`


  
  

use_package("magrittr")
use_package("data.table")
use_package("lubridate", type="Imports")  ## how about order ?! lubridate must be after data.table !
use_package("stringr")

use_package("IVIM")

## To update .Rd files in ./man, run:
devtools::document()
## Warning: The existing 'NAMESPACE' file was not generated by roxygen2, and will not be overwritten.
## So delete it, and then it will be created

To be discussed

Q: When I Clean and Rebuild: ????! ** testing if installed package can be loaded from final location

?????? C:/Users/gxd006/DOWNLO1/R-401.2 /etc/Renviron.site ?? May 12 19:25:09 Warning: replacing previous import ‘data.table::month’ by ‘lubridate::month’ when loading ‘IVIM’

* Setup

This is what you already have:
- An “Original” folder, with no specific structure, with R and Rmd files that contain:

  a) functions that you want to be re-used by others (and by yourself many months later!). They are tested, and the test codes are included in if(F) blocks, or in a separate .Rmd, or, even better, in an interactive Shiny App (eg https://rCanada.shinyapp.io/covid). It is a good idea to have them in a form that can be sourced: source("caCovid.R"); source("iviBase.R")

  b) functions and other codes that are not (yet) ready for re-use.

https://gccode.ssc-spc.gc.ca/r4gc/codes/tracking-covid-data - caCovid.R - iviBase.R - … common.R, plot.R, etc

This is what you want to get:
- Package folder (or several folders).
- GCCode/r4gc/packages/rCanada
- GCCode/r4gc/packages/IVI

https://gccode.ssc-spc.gc.ca/gorodnichy/rCanada

Ways to do it:

You can create it in two ways:

From New Project

Either way RStudio will initialize and launch your new project. In the second way, it also puts .gitignore

  1. In RStudio -> New Project -> New Directory -> R package -> package name: caPSES

NB: there are many templates. Choose the basic one (“R package”) and create it in the folder /my_packages. NB: If you have .R codes that are already source-able, attach them with the “Add” button (one at a time), or you can copy them into the /R folder later.

note from Hadley: https://r-pkgs.org/workflows101.html

Call usethis::create_package("path/to/package/pkgname"). Or, in RStudio, do File > New Project > New Directory > R Package. This ultimately calls usethis::create_package(), so really there’s just one way.

Don’t use package.skeleton() to create a package. Because this function comes with R, you might be tempted to use it, but it creates a package that immediately throws errors with R CMD build.

From script
  1. Run from any R, Rmd window or R console the command below

NB: output from

> devtools::create("../r4gc/packages/rCanada")
> #or usethis::create_package("../r4gc/packages/rCanada2")

New project 'rCanada' is nested inside an existing project '../r4gc/packages/', which is rarely a good idea.
If this is unexpected, the here package has a function, `here::dr_here()` that reveals why '../r4gc/packages/' is regarded as a project.
Do you want to create anyway?

1: Yes
2: Absolutely not
3: Nope

Selection: 1
√ Creating '../r4gc/packages/rCanada/'
√ Setting active project to 'C:/Users/gxd006/Downloads/_CODES/GCCode/r4gc/packages/rCanada'
√ Creating 'R/'
√ Writing 'DESCRIPTION'
Package: rCanada
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
    * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to
    pick a license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
√ Writing 'NAMESPACE'
√ Writing 'rCanada.Rproj'
√ Adding '^rCanada\\.Rproj$' to '.Rbuildignore'
√ Adding '.Rproj.user' to '.gitignore'
√ Adding '^\\.Rproj\\.user$' to '.Rbuildignore'
√ Opening 'C:/Users/gxd006/Downloads/_CODES/GCCode/r4gc/packages/rCanada/' in new RStudio session
√ Setting active project to '<no active project>'

* Overall Workflow

.. Copy needed codes from MY_CODES to R directory

  • Copy a function
dt.replaceAwithB <- function(dt, col, a, b) {
  dt[get(col)==a, (col):=b]; 
}
  • Insert a roxygen skeleton (from the magic wand menu) and add a description. NB: you may need to manually insert @import and @export

No need to Load All? Always start from “Clean and rebuild”, then Check.
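For illustration, this is roughly what the copied function could look like once the roxygen skeleton is filled in. The wording of the documentation, the added `invisible(dt)` return and the small check at the end are assumptions made for this sketch; the data.table package is assumed to be installed.

```r
library(data.table)

#' Replace one value with another in a data.table column
#'
#' Modifies `dt` by reference: every cell of column `col` equal to `a`
#' is set to `b`.
#'
#' @param dt A data.table.
#' @param col Name of the column to modify (character string).
#' @param a Value to look for.
#' @param b Replacement value.
#' @return `dt`, invisibly, modified in place.
#' @import data.table
#' @export
#' @examples
#' dt <- data.table::data.table(x = c("a", "b", "a"))
#' dt.replaceAwithB(dt, "x", "a", "z")
dt.replaceAwithB <- function(dt, col, a, b) {
  dt[get(col) == a, (col) := b]
  invisible(dt)
}

# Quick interactive check on a toy table
dt <- data.table(x = c("a", "b", "a"), y = 1:3)
dt.replaceAwithB(dt, "x", "a", "z")
```

Running devtools::document() afterwards turns the `#'` lines into man/dt.replaceAwithB.Rd and writes the matching export and import entries to NAMESPACE.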

* .Rbuildignore

^.*\.Rproj$
^\.Rproj\.user$
MY_CODES (incorrect)
^MY_CODES$ (correct)
^MY_DATASETS$
^LICENSE\.md$
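Each line of .Rbuildignore is a regular expression matched against paths relative to the package root, which is why the anchored form matters. The base R sketch below shows the difference on a few hypothetical paths; grepl() with perl = TRUE and ignore.case = TRUE is used here as an approximation of how R CMD build applies the patterns.

```r
# Hypothetical file paths, relative to the package root
paths <- c("MY_CODES", "R/MY_CODES_helpers.R", "MY_DATASETS", "R/plot.R")

# Unanchored pattern: matches any path that merely contains the text
unanchored <- grepl("MY_CODES", paths, perl = TRUE, ignore.case = TRUE)

# Anchored pattern: matches only the top-level entry named exactly MY_CODES
anchored <- grepl("^MY_CODES$", paths, perl = TRUE, ignore.case = TRUE)

data.frame(path = paths, unanchored, anchored)
```

The unanchored pattern would also exclude R/MY_CODES_helpers.R from the built package, which is rarely what is intended.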

* License: use_mit_license()

License: MIT + file LICENSE

* DESCRIPTION

* NAMESPACE

https://r-pkgs.org/namespace.html

use_package("data.table")
## > use_package("data.table")
## √ Adding 'data.table' to Imports field in DESCRIPTION
## * Refer to functions with `data.table::fun()`

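However, this does not change NAMESPACE! Who changes it? Is that devtools::document()? (Yes: use_package() only edits DESCRIPTION, while NAMESPACE is generated by roxygen2, which devtools::document() runs, from the @export, @import and @importFrom tags in your R files.)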

Generated by roxygen2: do not edit by hand

export()
export(addDerivatives)
export(extractMostInfectedToday)
export(readCovidUofT.csv)
importFrom(lubridate,dmy)
importFrom(stringr,str_replace)

I deleted NAMESPACE so that roxygen could generate it, and then manually added the lines below to it. Not sure that’s the way to do it!

import(data.table)
import(ggplot2)
import(lubridate)
import(magrittr)
import(IVIM)

It Worked !

If you are using just a few functions from another package, the recommended option is to note the package name in the Imports: field of the DESCRIPTION file and call the function(s) explicitly using ::, e.g., pkg::fun(). Alternatively, though no longer recommended due to its poorer readability, use @importFrom, e.g., @importFrom pkg fun, and call the function(s) without ::.
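A minimal sketch of the recommended `pkg::fun()` style is shown below. The function `median_present` is hypothetical, and the stats package is used only so that the example runs without extra installs. Qualified calls like this also sidestep warnings of the kind "replacing previous import", which can appear when two fully imported packages export the same name (for example month in both data.table and lubridate).

```r
#' Median of the non-missing values
#'
#' @param x A numeric vector.
#' @return A single number.
#' @export
median_present <- function(x) {
  # `stats` would be listed under Imports: in DESCRIPTION;
  # the call is qualified with ::, so no @importFrom tag is needed.
  stats::median(x, na.rm = TRUE)
}

median_present(c(1, NA, 3, 10))
```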

* Examples and tests

.. In main file in /R folder

#' @examples

Some examples can be wrapped so that they are not run:

#' \dontrun{}

.. In /tests folder

> use_testthat()

√ Setting active project to 'C:/Users/gxd006/Downloads/_CODES/GCCode/r4gc/packages/IVIM'
√ Adding 'testthat' to Suggests field in DESCRIPTION
√ Setting Config/testthat/edition field in DESCRIPTION to '3'
√ Creating 'tests/testthat/'
√ Writing 'tests/testthat.R'
* Call `use_test()` to initialize a basic test file and open it for editing.
Warning messages:
1: In readLines(f, n) :
  incomplete final line found on 'C:/Users/gxd006/Downloads/_CODES/GCCode/r4gc/packages/IVIM/DESCRIPTION'
...
  incomplete final line found on 'C:/Users/gxd006/Downloads/_CODES/GCCode/r4gc/packages/IVIM/DESCRIPTION'
> use_vignette()
Error in check_vignette_name(name) : 
  argument "name" is missing, with no default
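Once use_testthat() has created tests/testthat/, each file created by use_test() holds one or more test_that() blocks. The sketch below shows what such a file could contain; the function `replace_value` is a hypothetical stand-in defined inline so that the example runs on its own (in a package it would live in the R/ folder), and the testthat package is assumed to be installed.

```r
library(testthat)

# Hypothetical function under test; in a package it would live in R/
replace_value <- function(x, a, b) {
  x[x == a] <- b
  x
}

test_that("replace_value swaps only the matching elements", {
  expect_equal(replace_value(c("a", "b", "a"), "a", "z"), c("z", "b", "z"))
  expect_equal(replace_value(c("a", "b"), "q", "z"), c("a", "b"))
})
```

In a package, all such files are run together with devtools::test(), and also as part of Check.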

.. in MY_CODES

Provide as .R, Rmd or shiny

* Documentation

devtools::document()

https://kbroman.org/pkg_primer/pages/docs.html

#' For more details see the help vignette:
#' \code{vignette("help", package = "mypkg")}

or 

\href{../doc/help.html}{\code{vignette("help", package = "mypkg")}}

* Vignettes

> use_vignette(name="my_vignettes")

√ Adding 'knitr' to Suggests field in DESCRIPTION
√ Setting VignetteBuilder field in DESCRIPTION to 'knitr'
√ Adding 'inst/doc' to '.gitignore'
√ Creating 'vignettes/'
√ Adding '*.html', '*.R' to 'vignettes/.gitignore'
√ Adding 'rmarkdown' to Suggests field in DESCRIPTION
√ Writing 'vignettes/my_vignettes.Rmd'
* Modify 'vignettes/my_vignettes.Rmd'

I later renamed my first vignette to intro.Rmd - manually

https://bookdown.org/yihui/rmarkdown-cookbook/package-vignette.html

Delivering package

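  • [Easiest way] Put the built package file (.tar.gz) in the package repo. Then it can be installed from its URL using install.packages("https://gccode.ssc-spc.gc.ca/r4gc/gc-packages/IVIM/-/blob/master/versions/IVIM_0.0.0.9000.tar.gz", repos = NULL, type = "source"). NB: the repos argument expects a CRAN-like repository, not a single file, so the file URL goes in the first argument; the raw-file link may be needed instead of the blob page (not verified here).
    • this .tar ball is obtained by running devtools::build(), which by default places it in the parent directory of your package project.
  • [CONFIRM THIS!] Put the source in GCCode. Since GCCode is a GitLab instance, devtools::install_github() is not expected to work with it; the likely equivalents are devtools::install_git("https://gccode.ssc-spc.gc.ca/r4gc/gc-packages/IVIM") or remotes::install_gitlab("r4gc/gc-packages/IVIM", host = "gccode.ssc-spc.gc.ca") (still to be confirmed on the GC network).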

Packaging and publishing w. pkgdown

https://fanwangecon.github.io/R4Econ/support/development/fs_packaging.pdf

Appendix: Various possible Errors and problems

Errors in using usethis::use_testthat()

Problem:


> usethis::use_vignette(name="howtoCovid")
Error in read.dcf(con) : 
  Line starting 'analyze and visualiz ...' is malformed!
> 

Solution:
https://stackoverflow.com/questions/59303030/devtools-error-in-read-dcfpath-desc-line-starting-this-corresponds-to
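A likely cause of this message is a DESCRIPTION file in which a field (typically Description:) continues on a new line that is not indented. The base R sketch below reproduces the failure on a made-up DESCRIPTION and shows that indenting the continuation line fixes it; the helper name `parses_as_dcf` and the file contents are hypothetical.

```r
# A DESCRIPTION whose Description: field continues on an unindented line
bad <- c(
  "Package: demo",
  "Description: Functions to",
  "analyze and visualize data."
)

# The same content with the continuation line indented by four spaces
good <- c(
  "Package: demo",
  "Description: Functions to",
  "    analyze and visualize data."
)

# Hypothetical helper: TRUE if the lines can be read as a DCF file
parses_as_dcf <- function(lines) {
  path <- tempfile()
  on.exit(unlink(path))
  writeLines(lines, path)
  tryCatch({
    read.dcf(path)
    TRUE
  }, error = function(e) FALSE)
}

bad_ok  <- parses_as_dcf(bad)
good_ok <- parses_as_dcf(good)
```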

Corrections from https://gccode.ssc-spc.gc.ca/r4gc/gc-packages/IVIM/-/tree/useR-dev/

(optional) Rtools

NB: This is not required

https://cran.r-project.org/bin/windows/Rtools/

http://web.mit.edu/insong/www/pdf/rpackage_instructions.pdf

package_initialization_script.R


 ## package_initialization_script.R
 
 library(devtools)

devtools::create("rCanada") ## if you are in /my_package
devtools::create("../../rCanada") ## will it erase it, if I already have it?
## or usethis::create_package("../r4gc/packages/rCanada2")

library(roxygen2)
## library(testthis). What's the difference from library(testthat) ??
library(testthat) ## do we need it?
## Attaching package: ‘testthat’
#
## The following object is masked from ‘package:devtools’:
#
##   test_file
#
## The following objects are masked from ‘package:magrittr’:
#
##   equals, is_less_than, not


usethis::use_testthat() ## from usethis, which is called from devtools


use_vignette(name="my_vignettes") #
use_vignette(name="data_linking")


use_package("data.table")
use_package("magrittr")
use_package("data.table")
use_package("lubridate", type="Imports")  ## how about order ?! lubridate must be after data.table !
use_package("stringr")

R101: Building COVID-19 Tracker App from scratch

The following page provides codes and video-recording for this tutorial:

https://open-canada.github.io/UseR/learn2020.html

You can see the final result here: https://itrack.shinyapps.io/covid/us.Rmd

Short tutorials

Geo/Spatial coding and visualization with R. Part 1:

Contributed by:

Text Analysis with R. Part 1:

Contributed by:

Dual Coding - Python and R unite !

Contributed by:

Working with ggtables

Contributed by:

Automate common look and feel of your ggplot graphs

Contributed by:

Automated generation of report cards

Contributed by:

VI Community Development

Shiny Apps

Source: https://open-canada.github.io/Apps/

These Applications have been built with contributions from data scientists across the Government of Canada, using open source tools and data, many as an outcome of the R4GC community training and socializing.

PSES Results interactive analysis and visualization:

URL: https://open-canada.github.io/Apps/pses Source: https://gccode.ssc-spc.gc.ca/r4gc/codes/pses

Automated ATIP requests topic extraction:

Source: https://gccode.ssc-spc.gc.ca/r4gc/codes/atip

Geo-mapped current, historical and predicted border wait times:

URL: https://open-canada.github.io/Apps/border (redirect to open.canada.ca).
Source: https://gccode.ssc-spc.gc.ca/gorodnichy/simborder

Demo and tool for linking noisy data records:

URL: https://rcanada.shinyapps.io/demo/ (presented at the #GC Data2021 Conference )
Source: https://gccode.ssc-spc.gc.ca/gorodnichy/iviLink

Appendices

Lunch and Learn series

This is the scheduled agenda for the Lunch and Learn ‘Data Science with R’ series organized by the R4GC community. Please see the Lunch and Learn page for details on how to join us for these meetings.

12 Nov - 19 Nov 2021: Text Analysis with R follow-up / Converting codes to Shiny App

8 Oct 2021: Text Analysis with R. Part 1: identifying near-duplicate documents

1 Oct 2021: Shiny App to summarize very large, high-dimensional tables (code & app provided)

30 Jul - 17 Sep 2021: Geo/Spatial coding and visualization with R. (code provided)

16 Jul 2021: Dual Coding - Python and R unite ! (code provided)

9 Jul 2021: Exploring ggplots (recording, code provided)

2 Jul 2021: Parsing GC Tables (code provided)

25 Jun 2021: Using the Open Government Portal API within R (recording, code on github.com/open-canada)

21 Apr 2021: Analyzing PSES results using R and Shiny

16 Apr - 15 May 2021: Building R packages (recording, codes provided)

Discussed RStudio Webinars

Webinars at RStudio - https://www.rstudio.com/resources/webinars/ (codes at https://github.com/rstudio/webinars) The new insights from these webinars and other R-related blogs and events are discussed at R4GC community meetups

Technical information

This book is written in R Markdown and R, using the ‘bookdown’ package, and compiled in RStudio.

Colophon

TBD

Release Notes

TBA

Français

For now, this book is being developed in English only. However, many of the comments and tips shared in our community are also contributed in French.

We hope that at some point this book will be available in both official languages: English and French.

We could even make use of automated tools and artificial intelligence (we are data scientists, are we not?) to automate the translation of the content of this book!

If you are interested in contributing to our efforts to translate this book into the language of your choice, please contact Dmitry Gorodnichy.
